Creators/Authors contains: "Shrestha, Sandesh"

  1. Free, publicly-accessible full text available June 1, 2024
  2. Abstract

    Motivation

    Developing new crop varieties with superior performance is highly important to ensure robust and sustainable global food security. The speed of variety development is limited by long field cycles and advanced generation selections in plant breeding programs. While methods to predict yield from genotype or phenotype data have been proposed, improved performance and integrated models are needed.

    Results

    We propose a machine learning model that leverages both genotype and phenotype measurements by fusing genetic variants with multiple data sources collected by unmanned aerial systems. We use a deep multiple instance learning framework with an attention mechanism that sheds light on the importance given to each input during prediction, enhancing interpretability. Our model reaches a Pearson correlation coefficient of 0.754 ± 0.024 when predicting yield in similar environmental conditions, a 34.8% improvement over the genotype-only linear baseline (0.559 ± 0.050). We further predict yield on new lines in an unseen environment using only genotypes, obtaining a prediction accuracy of 0.386 ± 0.010, a 13.5% improvement over the linear baseline. Our multi-modal deep learning architecture efficiently accounts for plant health and environment, distilling the genetic contribution and providing excellent predictions. Yield prediction algorithms that leverage phenotypic observations during training therefore promise to improve breeding programs, ultimately speeding up delivery of improved varieties.

    Availability and implementation

    Available at https://github.com/BorgwardtLab/PheGeMIL (code) and https://doi.org/doi:10.5061/dryad.kprr4xh5p (data).
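    The attention mechanism mentioned in the Results can be illustrated with a minimal sketch of attention-based multiple instance pooling. This is not the authors' implementation (see the linked repository for that); the function name `attention_mil_pool` and the parameters `v` and `w` are hypothetical, and the numbers are made up for illustration. Each instance in a bag (for example, one input source for a plot) gets a score, the scores are normalized with a softmax, and the resulting weights show how much each instance contributes to the pooled representation.

```python
import math


def attention_mil_pool(bag, v, w):
    """Pool a bag of instance vectors into one vector using attention weights.

    bag: list of instance vectors, each of length d
    v:   list of h projection rows, each of length d (hypothetical parameters)
    w:   list of h scoring weights (hypothetical parameters)
    Returns (pooled_vector, attention_weights).
    """
    # Score each instance: w . tanh(V x)
    scores = []
    for x in bag:
        hidden = [math.tanh(sum(vj * xj for vj, xj in zip(row, x))) for row in v]
        scores.append(sum(wi * hi for wi, hi in zip(w, hidden)))

    # Softmax over instances (shifted by the max score for numerical stability)
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Attention-weighted sum of the instance vectors
    d = len(bag[0])
    pooled = [sum(a * x[k] for a, x in zip(weights, bag)) for k in range(d)]
    return pooled, weights


# Toy bag with three 2-dimensional instances (illustrative values only)
bag = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[0.5, -0.5], [0.3, 0.8]]
w = [1.0, -1.0]
pooled, weights = attention_mil_pool(bag, v, w)
```

    The weights are non-negative and sum to one, so they can be read as the relative importance assigned to each instance in the bag.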

     
  3. Abstract

    The development of next-generation sequencing (NGS) enabled a shift from array-based genotyping to directly sequencing genomic libraries for high-throughput genotyping. Even though whole-genome sequencing was initially too costly for routine analysis in large populations such as breeding or genetic studies, continued advancements in genome sequencing and bioinformatics have provided the opportunity to capitalize on whole-genome information. As new sequencing platforms can routinely provide high-quality sequencing data for sufficient genome coverage to genotype various breeding populations, a limitation comes in the time and cost of library construction when multiplexing a large number of samples. Here we describe a high-throughput whole-genome skim-sequencing (skim-seq) approach that can be utilized for a broad range of genotyping and genomic characterization. Using optimized low-volume Illumina Nextera chemistry, we developed a skim-seq method and combined up to 960 samples in one multiplex library using dual-index barcoding. With the dual-index barcoding, the number of samples for multiplexing can be adjusted depending on the amount of data required, and could be extended to 3,072 samples or more. Panels of doubled haploid wheat lines (Triticum aestivum, CDC Stanley × CDC Landmark), wheat-barley (T. aestivum × Hordeum vulgare) and wheat-wheatgrass (Triticum durum × Thinopyrum intermedium) introgression lines as well as known monosomic wheat stocks were genotyped using the skim-seq approach. Bioinformatics pipelines were developed for various applications where sequencing coverage ranged from 1× down to 0.01× per sample. Using reference genomes, we detected chromosome dosage, identified aneuploidy, and karyotyped introgression lines from the skim-seq data. Leveraging the recent advancements in genome sequencing, skim-seq provides an effective and low-cost tool for routine genotyping and genetic analysis, which can track and identify introgressions and genomic regions of interest in genetics research and applied breeding programs.
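    The idea of detecting chromosome dosage from low-coverage reads can be sketched as follows. This is only an illustration under stated assumptions, not the pipeline described in the abstract: the function name `estimate_copy_number`, the euploid control sample, and all read counts are hypothetical. The sketch compares per-chromosome read counts in a sample with those of a control, and normalizes by the median ratio, which assumes that most chromosomes in the sample are present at normal dosage.

```python
from statistics import median


def estimate_copy_number(sample_counts, control_counts, expected_copies=2):
    """Estimate per-chromosome copy number from mapped read counts.

    sample_counts / control_counts: dicts mapping chromosome name -> read count.
    The control is assumed (hypothetically) to be euploid with `expected_copies`
    copies of every chromosome.
    """
    # Ratio of sample to control reads for each chromosome
    ratios = {c: sample_counts[c] / control_counts[c] for c in control_counts}
    # Median ratio corrects for different sequencing depth between the two
    # samples, assuming most chromosomes are at normal dosage
    scale = median(ratios.values())
    return {c: expected_copies * r / scale for c, r in ratios.items()}


# Made-up read counts: 1A looks monosomic, 2A looks trisomic
control = {"1A": 1000, "1B": 1200, "1D": 900, "2A": 1100, "2B": 1300}
sample = {"1A": 500, "1B": 1210, "1D": 890, "2A": 1650, "2B": 1290}
copies = estimate_copy_number(sample, control)
```

    In practice such estimates would usually be computed in windows along each chromosome rather than per whole chromosome, so that introgressed segments and partial deletions also show up.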

     
  4. Abstract

    The wheat wild relative Aegilops tauschii was previously used to transfer the Lr42 leaf rust resistance gene into bread wheat. Lr42 confers resistance at both seedling and adult stages, and it is broadly effective against all leaf rust races tested to date. Lr42 has been used extensively in the CIMMYT international wheat breeding program with resulting cultivars deployed in several countries. Here, using a bulked segregant RNA-Seq (BSR-Seq) mapping strategy, we identify three candidate genes for Lr42. Overexpression of a nucleotide-binding site leucine-rich repeat (NLR) gene AET1Gv20040300 induces strong resistance to leaf rust in wheat and a mutation of the gene disrupted the resistance. The Lr42 resistance allele is rare in Ae. tauschii and likely arose from ectopic recombination. Cloning of Lr42 provides diagnostic markers and over 1000 CIMMYT wheat lines carrying Lr42 have been developed, documenting its widespread use and impact in crop improvement.

     
  5. Abstract

    Breeding programs for wheat (Triticum aestivum L.) and other crops require one or more generations of seed increase before replicated trials can be sown to assess yield. Extensive phenotyping at this stage is challenging because of the small sizes of plots and large numbers of lines under evaluation, and therefore, breeders typically rely on visual selection to promote lines to yield evaluation. Aerial high-throughput phenotyping (HTP) enables the rapid acquisition of traits that may be useful for selection among early generation lines. With the objective of assessing the potential for aerial measurements recorded on seed increase plots to improve indirect selection for grain yield (GY), two sets of 1,008 early generation bread wheat breeding lines were sown both as replicated yield trials (YTs) and as small, unreplicated plots (SPs) at the International Maize and Wheat Improvement Center during two breeding cycles. Normalized difference vegetation indices (NDVI) collected with an unmanned aerial vehicle (UAV) in the SPs were observed to be heritable and moderately correlated with GY assessed in YTs. Furthermore, NDVI was more predictive of GY than univariate genomic selection (GS), with still higher overall predictive abilities from multitrait approaches. A related experiment showed that selection based on NDVI would have outperformed visual selection, though this approach would have driven a directional response in phenology because of confounding between phenology, NDVI, and GY. A restricted selection index was proposed to address this issue. These results provide a promising outlook for the use of aerial HTP to improve selection at the early generation, seed-limited stages of breeding programs.
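    For readers unfamiliar with the index used above, NDVI is conventionally computed from near-infrared (NIR) and red reflectance as (NIR - Red) / (NIR + Red). The sketch below shows that calculation and a simple per-plot average. The function names and reflectance values are hypothetical and chosen only for illustration; the abstract does not describe how the study extracted plot-level values from the UAV imagery.

```python
import math


def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel."""
    denom = nir + red
    if denom == 0:
        # Undefined when both bands are zero (e.g. masked pixels)
        return math.nan
    return (nir - red) / denom


def plot_mean_ndvi(pixels):
    """Average NDVI over the (nir, red) pixel pairs of one plot, skipping undefined pixels."""
    values = [ndvi(nir, red) for nir, red in pixels]
    valid = [v for v in values if not math.isnan(v)]
    return sum(valid) / len(valid) if valid else math.nan


# Made-up reflectance pairs for one small plot; the last pixel is masked
example_plot = [(0.60, 0.10), (0.55, 0.15), (0.0, 0.0)]
mean_ndvi = plot_mean_ndvi(example_plot)
```

    Values closer to 1 indicate denser, greener canopy, which is why plot-level NDVI can serve as an indirect indicator when yield itself cannot yet be measured.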

     